Abstract


This paper reviews the impact of data science and artificial intelligence (AI) on future ‘data-driven’
insurance markets. The impact of insurance automation (driven by so-called ‘Black Swan’ events such as
Covid-19) mirrors the impact of algorithmic trading, which radically changed the capital markets
(Koshiyama et al., 2020). The data science technologies driving change include: Big data, AI analytics,
Internet of Things, and Blockchain technologies.
These technologies are important since they underpin the automation of the insurance
1. Introduction


1.1. Data-driven insurance industry 


Data science technologies and artificial intelligence (AI) are revolutionizing the insurance marketplace and creating a new
generation of InsurTech companies that are ‘data-driven’ (cf. Amazon). In China, Zhong An (a digital insurance
collaboration of Alibaba, Tencent and Ping An) underwrote over 630 million insurance policies in its first year of
operation (The Digital Insurer forum). So-called ‘Black Swan’ events such as Covid-19 will clearly accelerate change in
how we work and how we buy products and services. The insurance industry already collects huge volumes of data and thus
has major opportunities for utilizing machine learning to uncover hidden patterns, unknown correlations, customer
preferences and risk. At one end, this allows precise risk profiling of institutions and individuals; at the other, it
allows building complex (ecology) models for long-term prediction of events such as natural disasters.


For the Insurance industry, the data science technologies provide, on the one hand, unprecedented volumes of data 
and analytics tools for analysis, but on the other ‘revolutionary’ innovations present new business challenges. To 
understand the opportunities offered by ‘data-driven’ insurance, it is necessary to understand the data science 
technologies contributing to the ‘perfect storm’. 


1.2. How AI will transform the insurance sector 


In insurance, AI-based analytics will help companies to drive new business by allowing them to
personalize insurance products and respond better to existing clients’ needs. More accurate underwriting models and
constant learning from new data will be reflected in premiums, which are often overestimated because of a lack of data.
There are a number of ways to leverage machine learning techniques in the insurance industry, including:


• Automated and Personalized product offerings. Automated processes have an important impact on the insurance
industry: they analyze large volumes of data and give a more specific view of the client’s activities. Insurers can
offer personalized products and solutions that are based on the specific needs of narrow segments. One example of
AI in customer service is the Lemonade insurance app (Lemonade.com), which makes use of AI-powered chatbots
to assist its customers. The AI chatbot provides personalized policies to customers in less than two minutes.


• Behavioral Product Pricing. Telematics, wearable sensors and smart watches (known as the Internet of Things) provide
a wealth of information for insurance companies to profile the client’s risk. AI algorithms can provide dynamically
changing premiums based on health markers or driving behaviors, which reflect the risk profile of the insured. In this
way, insurers can become more involved in their policyholders’ health or safe driving habits.


• Improved Risk Assessment. Compared to human assessors, machine learning can deliver additional predictions.
Insurance companies can construct targeted predictions of coverage changes and possible losses for policies, and
manage risks more effectively using different sources of data.


• Enhanced Fraud Detection. Fraud is one of the biggest problems for insurance companies. The FBI estimated that around
US$40 billion per year is lost to insurance fraud (FBI.gov). Machine learning algorithms can reduce human errors and
identify previously unobserved fraud patterns by identifying exceptions. Fraud detection solutions analyze massive
amounts of data from multiple sources. An AI-powered scoring system then analyzes each claim to estimate how likely it
is to be fraudulent.


• Business Processes Automation. AI-powered systems may eliminate tedious, repetitive and mundane processes
across the whole organization. AI-based products can help in analyzing complex documents and extracting vital
information from them.


The AI revolution will go even further leading to so-called Computable (insurance) Contracts (Marano and Noussia, 
2019). Computable contracts are legal specifications that a computer can read, understand, verify and execute; and 
therefore automate. 

Currently, Big Data technologies are applied to predict, monitor and analyze risks and claims in order to
develop effective strategies for customer attraction and retention. Table 1 lists examples of use cases where
machine learning techniques are applied:


Table 1: AI use cases in insurance 


Insurance area | Use case
Healthcare Insurance | Fraud and abuse detection system (Ilker et al., 2015).
Motor Insurance | Fraud detection system: a multiple classifier system based on Random Forest, Principal Component Analysis and Potential Nearest Neighbor is proposed (Yaqi et al., 2018).
Non-Life Reporting | Multilayer Perceptron applied to the prediction of insolvency of non-life insurance companies, on the basis of a set of financial ratios (Diaz et al., 2005).
Claims Reserving | Regression trees used to calculate claims reserves on individual claims data (Wüthrich, 2018).

1.3. Challenges for insurance industry under AI revolution


Challenges with adopting AI can be grouped into two areas: (a) those that stem from algorithms (e.g.,
explainability/interpretability, robustness, fairness), and (b) those related to the dynamic nature of AI.


Below we unpack the second group, which can be further divided into: (a) technology, (b) data strategy, and (c) culture.


• Technology stack/lack of technical knowledge: to deploy and implement AI applications, organizations need
knowledge of current AI advancements and technologies and of their shortcomings. Adopted technologies need to
be relevant to the business requirements as well as flexible enough to enable integration of different data sources
(both internal and external). AI solutions require a high degree of computational speed, which imposes further
infrastructure requirements.


• Data strategy: Big data obtained from customers, core systems and brokers needs to be appropriately managed. AI may
be working with different types of data (e.g., text, voice, video, image, and IoT sensor data). To fully leverage AI in the
industry, all available data sources should be utilized. Data may be structured, semi-structured or unstructured.
Data can also be classified as real-time, historical or third-party. Pulling all those data sources together,
managing data quality and creating value out of them can be a really challenging task for the industry. Also, insurance
companies need an efficient infrastructure for controlling their data (data shouldn’t be disintermediated by brokers).


• Aversion towards AI: both staff and customers need to be convinced that AI is being utilized to empower
decision-making in ways that will benefit everyone (improved efficiency for staff, more accurate risk pricing and
personalized products for customers).

A useful role model for, and illustration of, future data-driven insurance is the introduction of algorithmic trading in
the Capital Markets (Treleaven et al., 2013) over the past 18 years. Notable are, firstly, the collection and analysis of
increasingly huge volumes of real-time and historic data (e.g., financial, economic, social, alternative), and secondly, the
increasing use of AI algorithms that can switch dynamically to respond to changes in the market microstructure (Koshiyama
et al., 2020). Insurance companies will increasingly seek to collect and control their data, and to automate so as to
interact directly with their clients.


Next, we will unpack the new ‘disruptive’ technologies. 


2. Data science technologies driving change 
To provide context, our review divides the data science technologies into: 


• Data technologies: solutions for data management and collection, as well as services that are based on
data generated by both humans and machines.

• Algorithm technologies: new forms of ‘statistics’, such as machine learning, computational statistics, and complex
systems (e.g., deep neural networks, Monte Carlo simulation).

• Analytics technologies: the application of the data technologies (e.g., natural language, sentiment analysis
and behavioral analysis).

• Infrastructure technologies: the infrastructure for information management and automation (e.g.,
Blockchain-based digital marketplace, computable contracts).

2.1. Data technologies 


• Big data: the collection and analysis of huge volumes of historic and real-time information (e.g., financial, economic,
social media, alternative).

• Cloud Computing: on-demand availability of computer system resources such as data storage or computing power.

• Chatbots: data provided by computer programs that simulate human conversation through voice commands, text
chats or both, using natural language processing (NLP) and sentiment analysis to understand the conversation.

• Internet of Things (IoT): the inter-networking of ‘smart’ physical devices, vehicles, buildings, etc. that enables these
objects to collect and exchange data.


2.2. Algorithm technologies 


For completeness, this section unpacks algorithms across three domains (Table 2): Computational Statistics (e.g., 
Monte Carlo methods), AI and ML (e.g., Artificial Neural Networks), and Complex Systems (e.g., Agent-Based systems). 
While there may be some debate over the terminology, we find the classification helpful to distinguish between relatively 
well-established methods and more cutting-edge technologies. 


Malgorzata Smietanka et al. / Int.J.Data.Sci. and Big Data Anal. 1(1) (2021) 1-19 Page 4 of 19


Table 2: Algorithm domains 


Computational statistics: Computationally intensive statistical methods.

AI algorithms: Mimicking a new form of human learning, reasoning, knowledge, and decision-making.
  • Knowledge or rule-based systems
  • Evolutionary algorithms
  • Machine learning

Complex systems: Systems featuring a large number of interacting components whose aggregate activity is nonlinear.


2.2.1. Computational statistics 


Computational statistics refers to computationally intensive statistical methods including Resampling methods
(e.g., bootstrap and cross-validation), Monte Carlo methods, Kernel Density estimation and other semi- and non-parametric
methods, and generalized additive models (Efron and Hastie, 2016; Wood, 2017). Examples include: (a) Resampling
methods: a variety of methods for doing one of the following: (i) estimating the precision of sample statistics using
subsets of data (e.g., jack-knifing) or samples drawn randomly from a set of data points (e.g., bootstrapping); (ii) exchanging
labels on data points when performing significance tests (e.g., permutation tests); (iii) validating models by using
random subsets (e.g., repeated cross-validation); (b) Monte Carlo methods: a broad class of computational algorithms
that rely on repeated random sampling to approximate integrals, particularly used to compute expected values including
those meant for inference and estimation (e.g., Bayesian estimation, simulated method of moments); (c) Kernel Density
estimation: a set of methods used to approximate multivariate density functions from a set of data points; it is largely
applied to generate smooth functions, reduce the effects of outliers, improve joint density estimation and sampling, and
derive non-linear fits; (d) Generalized Additive Models: a large class of nonlinear models widely used for inference and
predictive modeling (e.g., time series forecasting, curve-fitting, etc.); and (e) Regularization methods: these
are increasingly used as an alternative to traditional hypothesis testing and criteria-based methods, since they allow
better quality forecasts with a large number of features.
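
As a minimal sketch of two of the methods above, the following Python fragment estimates the precision of a sample mean by bootstrapping and approximates an expected value by Monte Carlo sampling. The claim sizes, the claim probability and the exponential severity model are made-up assumptions for illustration only; the function names are hypothetical.

```python
import random
import statistics


def bootstrap_se(sample, stat=statistics.mean, n_resamples=2000, seed=0):
    """Bootstrap standard error: resample with replacement, recompute the statistic,
    and report the spread of the replicates."""
    rng = random.Random(seed)
    n = len(sample)
    replicates = [stat(rng.choices(sample, k=n)) for _ in range(n_resamples)]
    return statistics.stdev(replicates)


def monte_carlo_expected_loss(claim_prob, mean_severity, n_policies, n_sims=5000, seed=0):
    """Monte Carlo estimate of the expected aggregate loss of a toy portfolio.

    Assumption: each policy claims independently with probability `claim_prob`,
    and claim severities are exponential with mean `mean_severity`.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_sims):
        loss = 0.0
        for _ in range(n_policies):
            if rng.random() < claim_prob:
                loss += rng.expovariate(1.0 / mean_severity)
        total += loss
    return total / n_sims


claims = [1200.0, 850.0, 430.0, 9800.0, 1500.0, 720.0, 2600.0, 990.0]  # made-up claim sizes
se = bootstrap_se(claims)
expected = monte_carlo_expected_loss(claim_prob=0.1, mean_severity=1000.0, n_policies=50)
```

Under these assumptions the exact expected aggregate loss is 50 x 0.1 x 1000 = 5000, so the simulated value can be checked against a known answer before the same approach is applied to models with no closed form.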


2.2.2. AI and machine learning


This AI continuum of epistemological models spans three main communities: (a) Knowledge-based or heuristic algorithms
(e.g., rule-based), where knowledge is explicitly represented as ontologies or IF-THEN rules rather than implicitly via
code (Giarratano and Riley, 1998); (b) Evolutionary or metaheuristic algorithms, a family of algorithms for global
optimization inspired by biological evolution, using population-based trial-and-error problem solvers with a metaheuristic
or stochastic optimization character (e.g., Genetic Algorithms, Genetic Programming, etc.) (Poli et al., 2008; Brownlee,
2011); and (c) Machine learning algorithms, a type of AI program with the ability to learn without explicit programming
and to change when exposed to new data; these mainly comprise Supervised (e.g., Support Vector Machines, Random
Forest, etc.), Unsupervised (e.g., K-Means, Independent Component Analysis, etc.), and Reinforcement Learning (e.g.,
Q-Learning, Temporal Differences, Gradient Policy Search, etc.) (Hastie et al., 2009; Sutton and Barto, 2018). Russell and
Norvig (2020) provide an in-depth view of different aspects of AI.


2.2.3. Complex systems 


Lastly, a complex system is any system featuring a large number of interacting components (e.g., agents, processes, etc.)
whose aggregate activity is nonlinear (not derivable from the summations of the activity of individual components)
and which typically exhibits hierarchical self-organization under selective pressures (Taylor, 2014; Barabasi, 2016). Examples
include: (a) Cellular automata: a collection of cells arranged in a grid, such that each cell changes state as a function
of time according to a defined set of rules that includes the states of neighboring cells; (b) Agent-based models: a class
of computational models for simulating the actions and interactions of autonomous agents (individual or
collective entities such as organizations or groups) with a view to assessing their effects on the system as a whole; (c)
Network-based models: a complex network is a graph (network) with non-trivial topological features, that is, features that do
not occur in simple networks such as lattices or random graphs but often occur in graphs modeling real systems; and
(d) Multi-Agent systems: this subarea focuses on formulating cooperative-competitive policies for a multitude of
agents with the aim of achieving a given goal; this topic has significant overlap with Reinforcement Learning and
Agent-based models.
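
To make the cellular automaton definition concrete, here is a small sketch of a one-dimensional binary automaton in Python. The choice of an elementary rule number (90 by default), the wrap-around boundary and the seven-cell grid are illustrative assumptions, not something taken from the cited works.

```python
def step(cells, rule=90):
    """One synchronous update of a 1-D binary cellular automaton with wrap-around edges.

    Each cell's next state depends only on its own state and its two neighbours;
    `rule` encodes the lookup table for the 8 possible neighbourhoods.
    """
    n = len(cells)
    new_cells = []
    for i in range(n):
        left, centre, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        neighbourhood = (left << 2) | (centre << 1) | right  # integer in 0..7
        new_cells.append((rule >> neighbourhood) & 1)
    return new_cells


state = [0, 0, 0, 1, 0, 0, 0]
history = [state]
for _ in range(3):
    state = step(state)
    history.append(state)
```

With rule 90 each cell becomes the exclusive-or of its two neighbours, so a single active cell spreads outwards; the global pattern emerges from a purely local rule, which is the property the definition above emphasises.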
Table 3: Landscape of algorithms and their applications

Algorithm class | NLP and sentiment analysis | Fraud detection | Actuarial modeling | Reporting/Compliance/regulation
Computational Statistics | (Cambria et al., 2013) | (Juszczak et al., 2008) | (Baumer, 2000) | (Yang and Koshiyama, 2019)
Machine Learning | (Kolchyna et al., 2015) | (Adewumi and Akinyelu, 2017) | (Richman, 2018) | (van Liebergen, 2017)
Complex Systems | (Batrinca and Treleaven, 2015) | (Xu and Chen, 2005) | (Allan et al., 2012) | (May et al., 2008)


As an illustration of this landscape of algorithms and research, Table 3 presents a non-exhaustive list of references 
that links each class of algorithms to applications in different areas of insurance. 
2.3. Analytics technologies 


Analytics technologies are used by many insurers to collect and predict customer behavior. 


2.3.1. Behavioural/Predictive Analytics 


Predictive analytics is the analysis of large and varied data sets, both historical and real-time, to uncover hidden
patterns, unknown correlations and customer preferences, and to predict future outcomes and trends in support of
informed decisions. It ‘forecasts’ what might happen in the future with an acceptable level of reliability, and includes
what-if scenarios and risk assessment.


Insurance companies can use predictive analytics for: (a) pricing and risk selection; (b) identifying risks (e.g., fraud, 
cancellation); and (c) identifying outlier claims. 
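
As one simple illustration of item (c), outlier claims can be flagged with an interquartile-range rule (Tukey's fences). This is a basic statistical stand-in rather than a predictive model, and the claim amounts and the function name are made up for the sketch.

```python
import statistics


def flag_outlier_claims(amounts, k=1.5):
    """Return the indices of claims lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(amounts, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, amount in enumerate(amounts) if amount < low or amount > high]


claims = [1200.0, 850.0, 430.0, 9800.0, 1500.0, 720.0, 2600.0, 990.0]  # made-up amounts
outliers = flag_outlier_claims(claims)
```

In practice a flagged claim would be a candidate for review, not a verdict; a production system would combine such rules with learned models and domain checks.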
2.4. Infrastructure technologies 
For the Insurance industry key infrastructure technologies driving change are: (a) Federated learning, (b) Computable 
insurance contracts; and (c) Blockchain-based digital marketplaces. 
2.4.1. Federated learning


Federated learning is an important emerging technique, given the value and sensitivity of data (e.g., financial, business,
social, alternative and regulatory). With federated learning, the focus is on a decentralized framework that enables multiple
data holders to collaborate and to converge to a common model without exchanging raw data.
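
The idea of converging to a common model without exchanging raw data can be sketched with a toy parameter-averaging loop, in the style of federated averaging. This is an illustrative assumption, not the method of any framework named in this paper: two hypothetical insurers share the same single feature but hold different customers, each refines a linear model locally, and only the model parameters are sent to the server.

```python
def local_update(w, b, data, lr=0.05, epochs=20):
    """A participant refines the shared model (w, b) using only its own data."""
    for _ in range(epochs):
        grad_w = sum((w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum((w * x + b - y) for x, y in data) / len(data)
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def federated_round(w, b, participants):
    """Server step: average the locally updated parameters, weighted by sample count."""
    total = sum(len(data) for data in participants)
    updates = [(local_update(w, b, data), len(data)) for data in participants]
    new_w = sum(params[0] * n for params, n in updates) / total
    new_b = sum(params[1] * n for params, n in updates) / total
    return new_w, new_b


# Hypothetical, noise-free data generated from y = 2x + 1; the raw pairs never leave their owner.
insurer_a = [(x, 2.0 * x + 1.0) for x in (0.0, 0.5, 1.0, 1.5)]
insurer_b = [(x, 2.0 * x + 1.0) for x in (2.0, 2.5, 3.0)]

w, b = 0.0, 0.0
for _ in range(200):
    w, b = federated_round(w, b, [insurer_a, insurer_b])
```

Because both participants' data follow the same relationship, the averaged model approaches w = 2 and b = 1. Real deployments add secure aggregation, communication handling and protection against information leaking through the parameters, none of which is shown here.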


We distinguish the following categories of Federated learning: 


• Horizontal Federated Learning (HFL): assumes that datasets from different participants share the same feature
space but may not share the same sample ID space (Figure 1).


[Figure 1 panels: horizontal federated learning (data from A and data from B; large overlap of features of the two data sets) and vertical federated learning (data from A, data from B and labels; large overlap of sample IDs (users) of the two data sets).]


Figure 1: HFL vs. VFL 
• Vertical Federated Learning (VFL): participants share the same ID space but have different feature spaces, while
label information is owned by one participant (see Figure 1). One possible use case is two e-commerce companies
and a bank collaborating, through VFL, on training a model to recommend personalized loans for users based on their
online shopping behaviors (Yang et al., 2019). This use case can easily be adapted to the insurance domain,
where, for example, the NHS collaborates with medical insurance providers to provide personalized medical insurance
products.


Open source FL architectures include Google’s TensorFlow Federated (https://www.tensorflow.org/federated), PySyft
(https://github.com/OpenMined/PySyft) and WeBank’s Federated AI Technology Enabler, FATE (Li et al., 2019).


2.4.2. Computable insurance contracts 


Computer-readable and executable legal specifications are set to have a profound impact on business and legal services.
So-called Computable contracts are legal specifications that a computer can read, understand, verify and execute, and
therefore automate. The challenge is to specify a contract that can be composed and read by professionals, and can
also be translated into a domain-specific computer-readable specification such as XML (www.service-architecture.com/
articles/xml/insurance_xml.html). Here the big potential is automating high-volume, low-cost domestic insurance (e.g.,
health, vehicle) without human intervention. Smart contracts are simply rules, possibly computer programs, which
codify transactions and contracts with the intent that the records managed by the distributed ledger are authoritative
with respect to the existence, status and evolution of the underlying legal agreements they represent. Smart contract
technology has the potential to automate laws and statutes.
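
A rough sketch of what "read, verify and execute" could mean for a very simple policy is given below. The parametric flight-delay cover, its field names and its amounts are all hypothetical; a real computable contract would be derived from a legally reviewed specification (the XML route mentioned above is one option) rather than hand-written in Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DelayCover:
    """Hypothetical parametric flight-delay policy: the terms are data, the clause is code."""
    premium: float
    threshold_minutes: int
    payout: float

    def __post_init__(self):
        # "Verify" step: reject terms that are not well-formed before anything executes.
        if self.premium <= 0 or self.payout <= 0 or self.threshold_minutes <= 0:
            raise ValueError("contract terms must be positive")

    def settle(self, delay_minutes: int) -> float:
        """'Execute' step: pay out if the observed delay meets the agreed threshold."""
        if delay_minutes < 0:
            raise ValueError("delay cannot be negative")
        return self.payout if delay_minutes >= self.threshold_minutes else 0.0


policy = DelayCover(premium=12.0, threshold_minutes=120, payout=150.0)
claim_paid = policy.settle(delay_minutes=150)
no_claim = policy.settle(delay_minutes=30)
```

The point of the sketch is that settlement needs no human judgement once the trigger (the delay) is observed, which is why high-volume, low-cost products are the natural first candidates.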


2.4.2.1. Blockchain technologies 


The core technologies are: Distributed Ledger Technology (DLT) and smart contracts. DLT is a decentralized database
where transactions are kept in a shared, replicated, synchronized, distributed bookkeeping record, which is secured by
cryptographic sealing. The key distinction between ‘distributed ledgers’ and ‘distributed databases’ is that nodes of the
distributed ledger cannot/do not trust other nodes, and so must independently verify transactions before applying
them.
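
The notions of cryptographic sealing and independent verification can be illustrated with a hash-chained list of blocks. This is a deliberately minimal sketch: it has no networking, consensus or signatures, and the block layout and policy identifiers are assumptions made for the example.

```python
import hashlib
import json


def block_hash(index, prev_hash, transactions):
    """Seal a block: hash its contents together with the previous block's hash."""
    payload = json.dumps([index, prev_hash, transactions], sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def append_block(chain, transactions):
    """Add a new sealed block that points at the current end of the chain."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append({
        "index": index,
        "prev_hash": prev_hash,
        "transactions": transactions,
        "hash": block_hash(index, prev_hash, transactions),
    })


def verify_chain(chain):
    """What each node can do on its own: recompute every seal and check every link."""
    prev_hash = "0" * 64
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(i, prev_hash, block["transactions"]):
            return False
        prev_hash = block["hash"]
    return True


ledger = []
append_block(ledger, [{"policy": "P-001", "event": "issued"}])
append_block(ledger, [{"policy": "P-001", "event": "claim paid", "amount": 150.0}])
ok_before = verify_chain(ledger)

ledger[0]["transactions"][0]["event"] = "cancelled"  # tamper with an old record
ok_after = verify_chain(ledger)
```

Because each seal covers the previous seal, altering any historical record invalidates the chain from that point on, so a node that does not trust its peers can still detect the change by recomputation.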


2.4.3. Blockchain-based digital marketplaces 


With the Uberization and globalization of insurance services, a major opportunity exists to create an ‘Amazon/Alibaba’
for digital insurance services using Blockchain technology built on top of the Internet. Blockchain marketplaces are under
development in Singapore (www.sginnovate.com/events/future-blockchain-based-data-marketplaces-challenges-and-impact)
and Dubai (www.smartdubai.ae/). However, the most comprehensive ‘digital’ Blockchain infrastructure program
is Estonia’s e-Estonia (https://e-estonia.com/), where every citizen has a digital identity, digital signature and personal
record, and virtually all government services are digital and online.


In the next section we will classify and introduce machine learning algorithms. 


3. Machine learning paradigms 


The great computational strength of ML algorithms is their ability to ‘learn’ without explicit programming. Understanding 
computational ‘learning’ is likely to have a profound effect on future science, in both artificial and natural (biological) systems. 


As illustrated by the examples in Table 4, the driving forces of new ML algorithms are broadly a combination of the
classical trio of Supervised, Unsupervised and Reinforcement Learning with the disruptors: Deep Learning, Adversarial
Learning, and Transfer and Meta Learning. This interaction constantly yields new models (e.g., Long Short-Term Memory
networks (LSTMs), Generative Adversarial Networks (GANs)) and applications (e.g., Natural Language Processing, Object
Recognition, Forecasting, etc.).


Table 4: Algorithms emerging by interaction between different learning paradigms

Disruptor | Supervised | Unsupervised | Reinforcement
Deep Learning | Deep Convolutional Neural Networks, Deep Recurrent Neural Networks (Goodfellow et al., 2016). | Deep Autoencoders (Goodfellow et al., 2016), Deep clustering (Caron et al., 2018). | Deep Q-Learning, Trust Region Policy Optimization, Asynchronous Advantage Actor Critic (Arulkumaran et al., 2017)
Adversarial Learning | Adversarial Semi-supervised Learning (Miyato et al., 2017); Adversarial Robustness in Supervised Learning (Nicolae et al., 2019) | Adversarial Autoencoders (Makhzani et al., 2016); Adversarial Representation Learning (Chen et al., 2016), Generative Adversarial Networks (Goodfellow et al., 2014) | Adversarial Policies (Gleave et al., 2019); Robust Adversarial Reinforcement Learning (Pinto et al., 2017)
Transfer/Meta Learning | OPEN-GPT (Radford et al., n.d.); BERT (Devlin et al., 2018); MedicalNet (Chen et al., 2019) | Bayesian Unsupervised One-Shot learning (Fei-Fei et al., 2003); Embeddings from Language Model (Siddhant et al., 2019) | Darla (Higgins et al., 2017); Deep Transfer Reinforcement Learning for Text Summarization (Keneshloo et al., 2019)


3.1. Supervised, unsupervised, reinforcement 
ML firstly subdivides into: 


• Supervised learning: Given a set of inputs/independent variables/predictors $x$ and outputs/dependent variables/
targets $y$, the goal is to learn a function $f(x)$ that approximates $y$. This is accomplished by supervising $f(x)$, that is,
providing it with examples $(x_1, y_1), \dots, (x_n, y_n)$ and feedback whenever it makes mistakes or accurate predictions.


• Unsupervised learning: Given several objects/samples/transactions $x_1, \dots, x_n$, the goal is to learn a hidden map $h(x)$
that can uncover a hidden structure in the data. This hidden map can be used to ‘compress’ $x$ (aka dimensionality
reduction) or to assign to every $x_i$ a group $c_i$ (aka clustering or topic modeling).


• Reinforcement learning: Given an environment formed by several states $s_1, s_2, \dots, s_n$, an agent, and a reward function,
the goal is to learn a policy $\pi$ that will guide the agent’s actions $a_1, a_2, \dots, a_n$ through the state space so as to maximize
occasional rewards.
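
The first two definitions above can be sketched in a few lines of Python. The least-squares line and the two-group clustering below are deliberately tiny stand-ins chosen for readability; the data are made up, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Supervised: learn f(x) = w*x + b from labelled pairs (x_i, y_i) by least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    w = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    return w, mean_y - w * mean_x


def two_means(xs, iters=20):
    """Unsupervised: assign every x_i to one of two groups c_i (1-D k-means with k = 2)."""
    centres = [min(xs), max(xs)]
    labels = [0] * len(xs)
    for _ in range(iters):
        # Assignment step: nearest centre wins.
        labels = [0 if abs(x - centres[0]) <= abs(x - centres[1]) else 1 for x in xs]
        # Update step: each centre moves to the mean of its members.
        for c in (0, 1):
            members = [x for x, label in zip(xs, labels) if label == c]
            if members:
                centres[c] = sum(members) / len(members)
    return labels, centres


w, b = fit_line([1.0, 2.0, 3.0, 4.0], [3.1, 4.9, 7.2, 8.8])      # labels y are given
labels, centres = two_means([1.0, 1.2, 0.8, 9.0, 9.5, 10.1])    # no labels are given
```

The contrast to note is in the inputs: `fit_line` is told the target for every example, whereas `two_means` receives only the observations and must propose the grouping itself.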


Figure 2 provides an illustration of these key learning paradigms. Suppose a database of financial reports is available.
If some of them have been historically labeled as positive or negative, we can leverage this to automatically tag future
documents. This can be accomplished by training a Learner in a supervised fashion. If these documents are unstructured,
and spotting relations or topics is the goal (political events, economic data, etc.), a Learner trained in an unsupervised
manner can help uncover these hidden structures. Also, these documents can characterize the current state of the
insurance markets. Using that, a Learner can decide which actions should be taken in order to maximize profits, detect
new risks, etc. By interacting with and gaining feedback from the environment (Markets), the Learner can reinforce some
behaviors so as to avoid future losses or inaccurate decisions.
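
The feedback loop described for the reinforcement case can be sketched with tabular Q-learning on a toy environment. The five-state chain, the reward of 1 at the right-hand end and the hyperparameters are assumptions chosen for illustration; they do not model an insurance market.

```python
import random

N_STATES, GOAL = 5, 4      # states 0..4; reaching state 4 ends the episode
ACTIONS = (-1, +1)         # action 0 moves left, action 1 moves right


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values by interacting with the chain and reinforcing rewarded moves."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s_next = min(max(s + ACTIONS[a], 0), GOAL)
            reward = 1.0 if s_next == GOAL else 0.0
            # Q-learning update: move the estimate towards reward plus discounted best next value.
            target = reward + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


q_table = q_learning()
policy = [0 if q[0] > q[1] else 1 for q in q_table[:GOAL]]  # greedy action per non-terminal state
```

The learned policy moves right in every state, even though the only reward is received at the far end; this is the sense in which occasional rewards shape behaviour across the whole state space.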


Figure 2: Main learning paradigms of machine learning


3.2. Deep learning, adversarial learning, transfer/meta learning 


These new forms of learning are ‘disrupting’ the current models prevalent in Supervised, Unsupervised and Reinforcement 
learning. They are not only powering new solutions and applications (e.g., driverless vehicles, smart-speakers, etc.) but 
they are making the resolution of previous problems cheaper, faster and more scalable. The second subdivision is: 


• Deep Learning: Deep learning algorithms attempt to model high-level abstractions in data by using multiple
processing layers, with complex structures or otherwise, composed of multiple non-linear transformations. Hence,
the mapping function we are attempting to learn can be broken down into several compositional operations,
$f(x) = f_1 \circ f_2 \circ f_3 \circ \dots \circ f_k(x)$. Various deep learning architectures such as deep neural networks,
convolutional deep neural networks, deep belief networks and recurrent neural networks have been applied to fields like
computer vision, automatic speech recognition, natural language processing, audio recognition and bioinformatics, where
they have been shown to produce state-of-the-art results on various tasks (Goodfellow et al., 2016; Chollet, 2017).


• Adversarial learning: Adversarial machine learning is a technique employed in the field of machine learning which
attempts to ‘fool’ models through malicious input. More formally, assume a given input $x$ associated with a label $c$ and
a machine learning model $f$ such that $f(x) = c$, that is, $f$ classifies $x$ correctly. We consider $x^*$ an adversarial example
if $x^*$ is indistinguishable from $x$ and $f(x^*) \neq c$. Since they are automatically crafted, these adversarial examples tend to
be misclassified more often than examples that are merely perturbed by noise (Szegedy et al., 2013; Kurakin et al.,
2016). Adversarial examples can be introduced during the training of models, making them more robust to attacks
from adversarial agents. Typical applications involve increasing robustness in neural networks, spam filtering,
information security applications, etc. (Huang et al., 2011).


• Transfer Learning: Focuses on utilizing knowledge gained while solving one problem to solve related ones. A
closely related technique is the multitask learning framework, which aims to learn multiple tasks at the same time, even
when they are different. A common approach is to uncover the common (latent) features that can benefit each
individual task. Figure 3 depicts the difference between the traditional approach of building and training machine
learning models and the methodology following transfer learning principles. Transfer learning is broadly used in
classification, regression and clustering problems. Raina et al. (2006) and Dai et al. (2007) proposed transfer
learning to learn text data across domains.


[Figure 3 panels: Learning Process of Traditional Machine Learning; Learning Process of Transfer Learning]

Figure 3: Traditional vs transfer learning
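
The adversarial example definition given in the list above can be illustrated on a small logistic classifier. The sketch uses a sign-of-gradient perturbation in the spirit of the fast gradient sign method; the weights, the input and the step size are hand-picked assumptions, and with only three features the perturbation is far from imperceptible, so this shows the mechanics only.

```python
import math


def predict_proba(w, b, x):
    """Logistic model: probability that x belongs to class 1."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def sign(value):
    """Return -1, 0 or +1 according to the sign of value."""
    return (value > 0) - (value < 0)


def sign_gradient_attack(w, b, x, label, eps):
    """Shift every feature by +/- eps in the direction that increases the log-loss."""
    p = predict_proba(w, b, x)
    grad = [(p - label) * wi for wi in w]  # gradient of the log-loss with respect to x
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


w, b = [2.0, -1.0, 0.5], 0.0      # toy weights (assumption)
x, c = [0.3, 0.2, 0.1], 1         # clean input with true label c = 1
x_adv = sign_gradient_attack(w, b, x, c, eps=0.2)

clean_class = int(predict_proba(w, b, x) > 0.5)
adv_class = int(predict_proba(w, b, x_adv) > 0.5)
```

Here the clean input is classified as class 1 while the perturbed input is classified as class 0, although no feature moved by more than `eps`. Adding such crafted inputs, with their correct labels, to the training set is the basic idea behind the adversarial training mentioned above.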


Next we look at how ‘disruptive’ models will change the Insurance Industry. 


4. Algorithms in future insurance markets 


New AI algorithms are constantly emerging; examples include LSTMs, a type of deep recurrent neural network capable
of learning arbitrary long-term dependencies; GANs, an architecture comprised of two networks, pitting one against
the other (thus the ‘adversarial’); and the Transfer and Meta learning families, paradigms to reuse experience gained by
solving predecessor problems as well as fine-tuning them for unseen tasks.


This section introduces LSTMs and GANs, their typical applications and potential uses across insurance markets. 


4.1. Long Short-Term Memory (LSTM) 


LSTM networks (Hochreiter and Schmidhuber, 2006) are a type of recurrent neural network (RNN) capable of learning
order dependence in sequence prediction problems, by keeping information about past inputs for an amount of time that
is not fixed a priori, but rather depends on its weights, number of stacked layers and on the input data. Whereas a simple
feed-forward neural network treats its inputs as independent, an RNN uses previous inputs within its calculations
to recognize the data’s sequential characteristics. Figure 4 illustrates the difference between a typical feed-forward, a
recurrent neural and an LSTM network.

Figure 4: Feed-Forward, Recurrent and LSTM Networks


More formally, we can express their distinctions by the mathematical steps each takes to produce an output:

• Feed-forward NN
  ◦ $y_t = f(x_t)$,

with $x_t$, $y_t$ as input and output at time $t$ respectively, and $f(x) = f_1 \circ f_2 \circ f_3 \circ \dots \circ f_k(x)$,
as in a deep neural network computation.


• Basic RNN
  ◦ $h_t = f(h_{t-1}, x_t)$
  ◦ $y_t = g(h_t)$

with $g$ representing a mapping from the hidden state $h_t$ back to the ‘visible’ output state $y_t$ at time $t$; the hidden
state can be broadly understood as a compressed representation of the sequence observed so far.


• LSTM
  ◦ $i_t = \sigma(x_t, h_{t-1})$
  ◦ $f_t = \sigma(x_t, h_{t-1})$
  ◦ $g_t = \tanh(x_t, h_{t-1})$
  ◦ $o_t = \sigma(x_t, h_{t-1})$
  ◦ $c_t = f_t * c_{t-1} + i_t * g_t$
  ◦ $h_t = o_t * \tanh(c_t)$
  ◦ $y_t = g(h_t)$

with $i_t$, $f_t$, $g_t$, $o_t$ denoting the input, forget, cell and output gates, respectively, $\sigma$ the sigmoid function,
$\tanh$ the hyperbolic tangent function, and $*$ the Hadamard product (the final line reuses the output mapping $g$ of the
basic RNN). Each gate has its own weights, which are omitted here for brevity. The input, forget, and output gates are
responsible for the transfer of information across the architecture, whilst the cell $c_t$ accumulates the information
processed across these gates. As their names imply, the input gate decides how much the time $t$ input and previous hidden
state still matter for the current moment; the forget gate acts as a ‘reset’, zeroing the accumulated information stored
in the cell; and the output gate modulates what part of the current cell state makes it to the final hidden state.
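
The LSTM update steps above can be sketched as a single time-step function. The version below makes the per-gate weights explicit as a matrix and bias acting on the concatenation of the input and the previous hidden state, which is a common formulation but an assumption relative to the abbreviated notation in the text. The all-zero weights in the demo are chosen only so that the result can be checked by hand.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Compute W v + b, with W given as a list of rows."""
    return [sum(wij * vj for wij, vj in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; params maps a gate name to (W, b) acting on [x_t, h_prev]."""
    v = x_t + h_prev                                       # list concatenation
    i = [sigmoid(z) for z in affine(*params["i"], v)]      # input gate
    f = [sigmoid(z) for z in affine(*params["f"], v)]      # forget gate
    g = [math.tanh(z) for z in affine(*params["g"], v)]    # cell candidate
    o = [sigmoid(z) for z in affine(*params["o"], v)]      # output gate
    c_t = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h_t = [ok * math.tanh(ck) for ok, ck in zip(o, c_t)]
    return h_t, c_t


H, D = 2, 1                                                # hidden size and input size
zero = ([[0.0] * (D + H) for _ in range(H)], [0.0] * H)    # all-zero weights and biases
params = {name: zero for name in "ifgo"}

h, c = [0.0] * H, [1.0, -2.0]
for x in ([0.5], [1.0], [-0.3]):                           # a short input sequence
    h, c = lstm_step(x, h, c, params)
```

With zero weights every sigmoid gate equals 0.5 and the candidate is 0, so the cell state simply halves at each step; after three steps it is one eighth of its starting value. With trained weights the forget gate would instead decide, input by input, how much of the cell to keep.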


Overall, basic RNNs are a network of neuron-like nodes organized into successive ‘layers.’ Each node in a given
layer is connected with a directed (one-way) connection to every other node in the next successive layer. In summary,
sequential information (e.g., financial time series) is preserved in the recurrent network’s hidden state, which manages to
span many time steps as it cascades forward to affect the processing of each new example. LSTMs are designed to
overcome one of the drawbacks of basic RNNs, called the vanishing gradient problem (Goodfellow et al., 2016), in which
performance of the neural network suffers because it can’t be trained properly. LSTM units categorize data into
short-term and long-term memory cells. This enables identification of which data is important and should be remembered and
looped back into the network, and which data can be forgotten.


Traditional use cases: The LSTM model has been found to be highly successful in many applications, such as
unconstrained handwriting recognition (Graves et al., 2009), speech recognition (Graves et al., 2013; Graves and Jaitly,
2014), handwriting generation (Graves, 2014), machine translation (Sutskever et al., 2014), image captioning (Kiros et al.,
2014; Xu et al., 2016), and parsing (Vinyals et al., 2015).


Applications in insurance markets: Weishan et al. (2016) employed a convolutional neural network (CNN) with 1-D
convolution and pooling to learn driving styles from GPS data. Further, Saleh et al. (2017) utilized LSTM recurrent
neural networks to classify driving behavior based on sensor data from smartphones. Diao and Wang (2019) explored an
LSTM-based approach to modeling insurance premium income.


4.2. Generative Adversarial Networks (GANs) 


GANs (Goodfellow et al., 2014) are a modeling strategy that employs two neural networks: a Generator (G) and a
Discriminator (D); see Figure 5. G is responsible for producing a rich, high-dimensional vector that attempts to
replicate a given data generation process; D acts to separate the input created by G from that of the real/observed data
generation process. They are trained jointly, with G benefiting from D’s inability to tell true from generated
data, whilst D’s loss is minimized when it correctly classifies inputs coming from G as fake and those from the dataset
as true. Competition drives both networks to improve their performance until the generated data is indistinguishable
from the genuine data.
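

The opposing objectives of the two networks can be illustrated with a small numerical sketch. It assumes that D outputs
a probability that its input is real and that both losses are binary cross-entropies, with the commonly used
‘non-saturating’ form for G; the function names and the example scores are illustrative, not taken from a specific
implementation.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-12):
    """Cross-entropy for D with labels True = 1 (real data) and Fake = 0 (generated data)."""
    return -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))

def generator_loss(d_fake, eps=1e-12):
    """Non-saturating loss for G: small when D scores generated data as real."""
    return -np.mean(np.log(d_fake + eps))

# Case 1: D separates real from generated data well (hypothetical scores).
sharp_real, sharp_fake = np.array([0.90, 0.95]), np.array([0.10, 0.05])
# Case 2: D cannot tell them apart and outputs 0.5 for everything.
confused = np.array([0.5, 0.5])

d_loss_sharp = discriminator_loss(sharp_real, sharp_fake)
d_loss_confused = discriminator_loss(confused, confused)
g_loss_sharp = generator_loss(sharp_fake)
g_loss_confused = generator_loss(confused)
```

When D separates the two sources well its loss is low and G’s loss is high; when D is reduced to guessing, the ordering
reverses, which is the equilibrium the joint training is pushing towards.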


[Figure 5 diagram: the Generator network G maps a sample z from a prior distribution to synthetic data; the
Discriminator network D labels its input as True = 1 or Fake = 0.]


Figure 5: General scheme of a Generative Adversarial Network


Traditional use cases: Overall, GANs have been successfully applied to image and text generation (Creswell et al.,
2017); BigGAN is a very successful example of using GANs for high-fidelity natural image synthesis and
representation learning (Brock et al., 2018).


Applications in insurance markets: Kuo (2019) used the GAN-based CTGAN architecture proposed by Xu et al. to
synthesize insurance datasets.


To safely adopt these disruptive techniques in a regulated environment, insurers need to focus on algorithm
interpretability/explainability techniques, which we discuss in the next section.


5. Interpretability/Explainability of AI algorithms


In the context of AI and ML, explainability and interpretability are often used interchangeably. Algorithmic interpretability
is about the extent to which a cause and effect can be observed within a system, and the extent to which an observer is
able to predict what will happen for a given set of inputs or algorithm parameters. Algorithmic explainability is the extent
to which the internal mechanics of an ML (deep learning) system can be explained in human terms. In simple terms,
interpretability is about understanding the algorithm’s mechanics (without necessarily knowing why); explainability is
being able to explain what is happening in the algorithm.


There are multiple ways to generate and provide explanations for an algorithmic decision-making system.
Figure 6 presents the types and levels of explainability: model-specific and model-agnostic, global and local (Molnar, 2020).
Below we unwrap these concepts, as well as outline some technical solutions:


[Figure 6 diagram: a grid with one axis running from model-agnostic to model-specific and the other from local to global.
Model-agnostic, local: Local Interpretable Model-Agnostic Explanations (LIME), Shapley values (SHAP), Counterfactual
explanations. Model-agnostic, global: Partial Dependence, Feature Importance. Model-specific, local and global: linear
model, decision tree, rule-based system.]


Figure 6: Types and levels of Explainability


Model-specific: With model-specific explainability, a model is designed and developed in such a way that it is fully
transparent and explainable by design. In other words, no additional explainability technique needs to be
overlaid on the model in order to fully explain its workings and outputs. In general, explainable models are
simpler than non-explainable models, and as such their accuracy is often lower.
Explainable models include linear regression, decision trees, k-nearest neighbors, and rule-based systems.


Model-agnostic: With model-agnostic explainability, a mathematical technique is applied to the outputs of any algorithm,
including very complex and opaque models, in order to provide an interpretation of the decision drivers for those
models. Two of the most popular approaches are Shapley Additive Explanations (SHAP) (Lundberg and Lee, 2017) and
Local Interpretable Model-Agnostic Explanations (LIME) (Ribeiro et al., 2016). However, a general limitation of
model-agnostic explainability techniques is that they entail running an additional model on top of an already complex
model. The explainability technique will never be 100% accurate, and therefore a layer of additional inaccuracy is
introduced into an already inaccurate model, and the output becomes one step further removed from reality.
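

To make the idea of ‘an additional model on top of the model’ concrete, the following is a minimal LIME-style sketch
rather than the LIME library itself. It assumes Gaussian perturbations around the instance, an exponential proximity
kernel and a weighted linear surrogate; `black_box`, the sampling scale and the kernel width are hypothetical choices
made for illustration.

```python
import numpy as np

def lime_like_explanation(predict, x, n_samples=500, scale=0.5, kernel_width=1.0, seed=0):
    """Fit a proximity-weighted linear surrogate to `predict` around the instance x."""
    rng = np.random.default_rng(seed)
    # 1. Sample perturbed points around the instance to be explained.
    Z = x + rng.normal(scale=scale, size=(n_samples, x.shape[0]))
    y = predict(Z)
    # 2. Weight each perturbed point by its similarity (proximity) to x.
    dist = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(dist ** 2) / (kernel_width ** 2))
    # 3. Weighted least squares: intercept plus one local slope per feature.
    A = np.hstack([np.ones((n_samples, 1)), Z - x])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[1:]

def black_box(Z):
    # Stand-in for an opaque model: nonlinear in feature 0, linear in feature 1, ignores feature 2.
    return Z[:, 0] ** 2 + 3.0 * Z[:, 1]

x0 = np.array([1.0, 0.0, 2.0])
slopes = lime_like_explanation(black_box, x0)
```

Around `x0` the surrogate’s slopes should be close to the local sensitivities of the black box (roughly 2, 3 and 0), but
only approximately, which illustrates the extra layer of inaccuracy noted above.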


Global: This facet focuses on understanding the algorithm’s behavior at a high/dataset/population level. The usual
techniques to provide these explanations are Feature Importance and Partial Dependence. Overall, these methods
quantify the weight of a feature in the model’s performance and predictions, usually by experimenting with small
changes in the data. Apart from a few models (like Decision Trees), both techniques are computationally expensive; it
takes time to vary each feature in order to approximate an accurate interpretation of the model, particularly with big
datasets. The typical users of Feature Importance and Partial Dependence are researchers and designers of algorithms,
since they tend to be more interested in the general insights and knowledge discovery that the model produces than
in specific individual cases.
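

Both global techniques can be sketched in a few lines. The version of Feature Importance below is the permutation
variant (shuffle one column, measure the loss increase), which is one common choice among several; the synthetic data
and the function `model`, standing in for a fitted predictor, are assumptions made for illustration.

```python
import numpy as np

def permutation_importance(predict, X, y, seed=0):
    """Increase in mean squared error when a single feature column is shuffled."""
    rng = np.random.default_rng(seed)
    base = np.mean((predict(X) - y) ** 2)
    scores = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])   # break the link between feature j and y
        scores.append(np.mean((predict(Xp) - y) ** 2) - base)
    return np.array(scores)

def partial_dependence(predict, X, j, grid):
    """Average prediction over the dataset when feature j is forced to each grid value."""
    curve = []
    for v in grid:
        Xv = X.copy()
        Xv[:, j] = v
        curve.append(predict(Xv).mean())
    return np.array(curve)

rng = np.random.default_rng(1)
X = rng.normal(size=(400, 3))

def model(A):
    # Stand-in for a fitted model: strong effect of feature 0, weak effect of 1, none of 2.
    return 2.0 * A[:, 0] + 0.5 * A[:, 1]

y = model(X)
importance = permutation_importance(model, X, y)
pd_curve = partial_dependence(model, X, 0, np.array([-1.0, 0.0, 1.0]))
```

Each importance score requires a full pass over the dataset per feature, and each partial dependence point a full pass
per grid value, which is where the computational cost mentioned above comes from.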


Local: This facet focuses on understanding the algorithm’s behavior at a low/subset/individual level. A variety of
methods have been developed to help interpret why a model reached its decision for a particular data point. Three
of the most popular tools are LIME, SHAP and Counterfactual explanations (CE) (Wachter et al., 2018). In a nutshell:
(i) LIME samples perturbed data points, weights them according to their similarity to the individual data point that is to
be explained, and fits a simple interpretable model to them; (ii) SHAP adds features one by one into the model, over many
orderings, and records the change in the output that each feature causes in order to identify its importance; this is
usually approximated via Monte Carlo sampling; and (iii) CE is a computationally expensive technique which considers
how the model would behave if some features had different values, allowing an explanation to be built up of individual
decision factors, and enabling potential recourse and a clearer understanding by the individual under analysis. The
typical users of local explanations are individuals being targeted by an algorithm, as well as members of the judiciary and
regulators trying to make a case about potential discrimination.
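

The Monte Carlo approximation mentioned under SHAP can be sketched as follows. This is a simplified permutation-sampling
estimator, not the SHAP library: features are switched from a reference (‘background’) row to the explained values in a
random order, and each feature is credited with the change in prediction it causes. The toy model and the all-zero
background are assumptions chosen so that the exact answer is known.

```python
import numpy as np

def shapley_monte_carlo(predict, x, background, n_permutations=200, seed=0):
    """Permutation-sampling estimate of Shapley values for a single instance x."""
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    phi = np.zeros(d)
    for _ in range(n_permutations):
        order = rng.permutation(d)
        z = background[rng.integers(len(background))].copy()   # random reference row
        prev = predict(z[None, :])[0]
        for j in order:
            z[j] = x[j]                     # switch feature j to the explained value
            cur = predict(z[None, :])[0]
            phi[j] += cur - prev            # marginal contribution of j in this ordering
            prev = cur
    return phi / n_permutations

def toy_model(Z):
    # Features 0 and 1 interact; feature 2 enters additively.
    return Z[:, 0] * Z[:, 1] + Z[:, 2]

x = np.array([2.0, 3.0, 1.0])
background = np.zeros((10, 3))
phi = shapley_monte_carlo(toy_model, x, background)
```

For this toy model the exact Shapley values are 3, 3 and 1: the interaction term of 6 is split equally between the two
interacting features. The estimate for those two features varies with the sampled orderings, while the contributions
always sum to the difference between the prediction at `x` and at the reference point.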


For other methods and models for Explainable AI we suggest the following reading (Table 5): 


Table 5: Explainable AI methods and models

| Method | Description | Reference |
|---|---|---|
| Black Box Explanations through Transparent Approximations (BETA) | Model-agnostic framework for explaining the behavior of any black-box classifier by simultaneously optimizing for fidelity to the original model and interpretability of the explanation. | (Lakkaraju et al., 2017) |
| Layer-wise Relevance Propagation (LRP) | Methodology for visualizing the contributions of single pixels to predictions for kernel-based classifiers over Bag of Words features and for multi-layered neural networks. | (Bach et al., 2015) |
| Deep Taylor Decomposition | The iterative application of Taylor decomposition from the top layer down to the input layer. | (Montavon et al., 2017) |
| Prediction Difference Analysis | In a classification problem, the method highlights areas in a given input image that provide evidence for or against a certain class. | (Zintgraf et al., 2017) |
| Deep Generator Networks | The algorithm generates qualitatively state-of-the-art synthetic images that look almost real, and reveals the features learned by each neuron in an interpretable way. | (Nguyen et al., 2016) |
| Testing with Concept Activation Vectors (TCAV) | Describes how CAVs may be used to explore hypotheses and generate insights for a standard image classification network as well as a medical application. | (Kim et al., 2018) |
| Deep Visualization | Proposes a tool that visualizes the activations produced on each layer of a trained convolutional neural network as it processes an image or video (e.g., a live webcam stream). A second proposed tool enables visualizing features at each layer of a DNN via regularized optimization in image space. | (Yosinski et al., 2015) |
| Recurrent Neural Networks cell state analysis | Analysis of the representations, predictions and error types of Recurrent Neural Networks (RNNs), and specifically a variant with Long Short-Term Memory (LSTM). | (Karpathy et al., 2015) |


In the next section we will discuss challenges of AI algorithms such as algorithm selection, bias, fairness and 
robustness of algorithms. 


6. Governance of algorithms 


Here we use the term governance of algorithms as a ‘catch all’ for the rules, practices and processes by which an
institution directs and controls algorithms and data. This is increasingly important since ML algorithms effectively
self-program and evolve dynamically. Hence, financial institutions and regulators are becoming increasingly concerned
with issues of algorithmic interpretability/explainability and data governance.


6.1. Algorithm selection


Users face a number of challenges when selecting algorithms for their application: 


Backtest overfitting — where many variations of a (trading) strategy are tried on the same dataset and, as a result,
strategies that look good on paper often perform poorly when presented with new data. There is growing interest in
devising procedures to deal with this issue; refer to (Koshiyama and Firoozye, 2019) for a review of
the current literature and a few solutions to backtest overfitting.


Feature engineering — is the augmentation of data; the process of going from raw data to data that is ready for modeling.
Strategies and associated algorithms include: (a) reducing data redundancy/dimensionality (e.g., PCA); (b) capturing
complex relationships (e.g., NNs); and (c) rescaling variables (e.g., standardizing or normalizing).


Data scarcity — means too few data points (to train a model), often because it is difficult to get data or the data is small
with respect to the amount needed. Data sparsity, by contrast, means that the data is distributed sparsely over the
available feature space.


Data sensitivity — data owners need to contribute data to collaborative analytics, while not wishing to ‘share’ 
extremely valuable and sensitive raw data. An important solution discussed below is Federated Learning. 


Hyperparameter optimization — is the problem of choosing a set of optimal input variables (i.e., hyperparameters) for 
a learning algorithm. A hyperparameter is a parameter whose value is used to control the learning process, in contrast 
to other parameters (typically node weights) that are ‘learnt’. 


Interpretability/Explainability — in machine learning explainability and interpretability are often used interchangeably, but: 
(a) Interpretability is about the extent to which a cause and effect can be observed within a system; and 
(b) Explainability is the extent to which the internal mechanics of an algorithm can be explained in human terms. 
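

Two of the challenges above, feature engineering and hyperparameter optimization, can be combined in a short sketch:
variables are standardized, reduced with PCA, and the penalty of a ridge regression is chosen on a held-out validation
set. The synthetic data, the ridge model, the candidate grid and the 200/100 split are all assumptions made for
illustration.

```python
import numpy as np

def standardize(X_train, X_other):
    """Rescale both sets using the training mean and standard deviation only."""
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
    return (X_train - mu) / sd, (X_other - mu) / sd

def pca_fit(X, k):
    """Top-k principal directions of already centred data, via SVD."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k].T

def ridge_fit(X, y, lam):
    """Ridge regression coefficients; lam is the hyperparameter, the coefficients are learnt."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Six redundant observed features driven by two latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(300, 6))
y = latent[:, 0] - 2.0 * latent[:, 1] + 0.1 * rng.normal(size=300)

# Feature engineering: rescale, then reduce dimensionality.
X_tr, X_va, y_tr, y_va = X[:200], X[200:], y[:200], y[200:]
X_tr, X_va = standardize(X_tr, X_va)
P = pca_fit(X_tr, k=2)
Z_tr, Z_va = X_tr @ P, X_va @ P

# Hyperparameter optimization: pick the penalty with the lowest validation error.
grid = [0.01, 0.1, 1.0, 10.0, 100.0]
val_mse = {lam: np.mean((Z_va @ ridge_fit(Z_tr, y_tr, lam) - y_va) ** 2) for lam in grid}
best_lam = min(val_mse, key=val_mse.get)
```

The scaling statistics and principal directions are estimated on the training split only, so that the validation error
used to select `best_lam` is not contaminated by information from the validation data.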


Examples of potential solutions to model selection challenges are: 


Pre-trained models — a model created to solve a similar problem, often on a large data set, is used as a starting point 
instead of building a model from scratch. This is the basis of Transfer learning. 


AutoML — automated machine learning (AutoML) is the end-to-end automation of the process of applying
machine learning to real-world problems. Companies such as H2O.ai, DataRobot and Amazon have created
AutoML-like systems.


6.2. Bias and fairness 


AI algorithms pose a significant challenge when it comes to identifying where and how they may introduce bias
into the decision-making process. Possible sources of bias include:


Training set — a skewed sample that, e.g., underrepresents members of protected classes.
Proxies — especially where proxies are hidden within other factors used in machine learning models.
Data completeness — data for minority groups might be less informative or reliable.


Unfairness can be mitigated at different points in a modeling pipeline: pre-processing, in-processing and
post-processing. Table 6 presents different methodologies to mitigate bias in AI systems:


Table 6: Modeling pipeline and different technical solutions for AI fairness

| Pipeline | Technical solution |
|---|---|
| Pre-processing | Reweighing subjects; Oversampling minority group; Disparate impact remover; Learning fair representations |
| In-processing | Adversarial debiasing; Fairness constraint; Counterfactual fairness |
| Post-processing | Calibrated equality of odds; Reject option classification |
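

As an illustration of the first pre-processing entry, the sketch below computes reweighing weights under the common
formulation in which each (group, label) cell receives the weight P(group) P(label) / P(group, label), so that group
membership and outcome become statistically independent in the weighted sample. The ten-row dataset and the helper
names are hypothetical.

```python
import numpy as np

def reweighing_weights(group, label):
    """Weight each sample by P(group) * P(label) / P(group, label)."""
    group, label = np.asarray(group), np.asarray(label)
    w = np.empty(len(label), dtype=float)
    for g in np.unique(group):
        for y in np.unique(label):
            cell = (group == g) & (label == y)
            if cell.any():
                w[cell] = (group == g).mean() * (label == y).mean() / cell.mean()
    return w

def positive_rate(group, label, g, weights=None):
    """(Weighted) share of favourable outcomes within group g."""
    mask = group == g
    return np.average(label[mask], weights=None if weights is None else weights[mask])

# Hypothetical data: group 0 receives the favourable label (1) far more often than group 1.
group = np.array([0] * 6 + [1] * 4)
label = np.array([1, 1, 1, 1, 0, 0, 1, 0, 0, 0])
w = reweighing_weights(group, label)

rate_before = (positive_rate(group, label, 0), positive_rate(group, label, 1))
rate_after = (positive_rate(group, label, 0, w), positive_rate(group, label, 1, w))
```

Before reweighing the favourable-outcome rates are 4/6 and 1/4; with the weights applied both equal the overall rate of
0.5. A model trained with these sample weights therefore no longer sees an association between group and label in its
training data.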


6.3. Robustness


Algorithmic robustness is characterized by how far an algorithm can be deemed safe and secure, that is, not vulnerable
to tampering or to compromise of the data it is trained on. We can rate an algorithm’s robustness using four key
criteria (EU-HLEG 2019):


• Resilience to attack and security: AI systems, like all software systems, should be protected against vulnerabilities
that adversaries can exploit, for example through data poisoning, model leakage or attacks on the underlying
infrastructure, both software and hardware. This concept is linked with the mathematical concept of Adversarial
Robustness (Carlini et al., 2019), that is, how would the algorithm perform in the worst-case scenario? (e.g., how
would the algorithm have reacted during the 2008 Financial Crisis?).


• Fallback plan and general safety: AI systems should have safeguards that enable a fallback plan in case of problems.
Also, the level of safety measures required depends on the magnitude of the risk posed by an AI system. This notion
is strongly associated with the technical concept of Formal Verification (Qin et al., 2019), which in broad terms
means: does the algorithm meet the problem specifications and constraints? (e.g., respect physical laws).


• Accuracy: pertains to an AI system’s ability to make correct judgments, for example to correctly classify information
into the proper categories, or its ability to make correct predictions, recommendations, or decisions based on data or
models. Accuracy as a general concept can be quantified by estimating the Expected Generalization Performance
(Arlot and Celisse, 2009), that is, in general, how well does the algorithm work? (e.g., in 7 out of 10 cases, the
algorithm makes the right decision).


• Reliability and reproducibility: a reliable AI system is one that works properly with a range of inputs and in a range
of situations, whilst reproducibility describes whether an AI experiment exhibits the same behavior when repeated
under the same conditions. This idea is tied to the software engineering concept of Continuous Integration
(Meyer, 2014), that is, is the algorithm auditable? (e.g., can its decisions be reliably reproduced?).
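

The worst-case question raised under resilience to attack can be illustrated with the fast gradient sign method, one of
the evasion attacks used to probe adversarial robustness. The sketch assumes a logistic classifier with hypothetical
fitted coefficients `w` and `b`, and a perturbation budget `eps` chosen purely for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps):
    """Fast gradient sign perturbation for a logistic model p = sigmoid(w.x + b)."""
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w               # gradient of the cross-entropy loss w.r.t. the input
    return x + eps * np.sign(grad_x)   # move each feature by eps in the loss-increasing direction

w, b = np.array([2.0, -1.0, 0.5]), 0.0    # hypothetical fitted coefficients
x, y = np.array([0.3, -0.2, 0.1]), 1.0    # an input whose true class is 1

x_adv = fgsm(x, y, w, b, eps=0.3)
p_clean = sigmoid(w @ x + b)
p_adv = sigmoid(w @ x_adv + b)
```

In this example a change of at most 0.3 per feature moves the predicted probability of the true class from about 0.70
to about 0.45, flipping the decision; a robust model would require a much larger perturbation to change its output.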


In practice, each technical criterion maps to a number of technical solutions (Table 7). These solutions can help
the analyst to measure robustness and to put systems in place that make algorithms more robust before the
deployment stage.


Table 7: Mapping technical criteria and solutions for algorithmic robustness

| Criteria | Technical solution |
|---|---|
| Expected generalization performance | Cross-validation: k-fold CV, bootstrap, etc.; Covariance-penalty: Mallows’ C_p, Stein Unbiased Risk Estimator, bootstrap approximation, etc. |
| Adversarial robustness | Evasion attacks: fast gradient sign method, DeepFool, etc.; Defence: label smoothing, variance minimization, etc. |
| Formal verification | Complete: Satisfiability Modulo Theory, Mixed-Integer Programming, etc.; Incomplete: Propagating bounds, Convex Optimization, etc. |
| Reliability and reproducibility | Code versioning: Git (GitHub), Mercurial (Bitbucket), etc.; Reproducible analysis: Binder, Docker, etc.; Automated testing: Travis CI, Scrutinizer CI, etc. |
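

The first row of Table 7 can be made concrete with a minimal k-fold cross-validation sketch for estimating expected
generalization performance. The least-squares model, the synthetic data and the choice of five folds are assumptions
made for illustration.

```python
import numpy as np

def k_fold_mse(X, y, fit, predict, k=5, seed=0):
    """Out-of-fold mean squared error for each of k folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])                 # fit on k-1 folds
        scores.append(np.mean((predict(model, X[test]) - y[test]) ** 2))  # score on the held-out fold
    return np.array(scores)

def fit_ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def predict_ols(beta, X):
    return X @ beta

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

scores = k_fold_mse(X, y, fit_ols, predict_ols)
cv_estimate = scores.mean()
```

Every observation is used for testing exactly once and never by a model that was trained on it, so `cv_estimate`
approximates the error on unseen data rather than the (optimistic) training error.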


6.4. Risk


Algorithms, especially so-called black box trading algorithms, amplify systemic risk for a number of reasons: (a)
Intensifying volatility: algorithms can react instantaneously to market conditions, and during volatile markets may
greatly widen their bid-ask spreads or temporarily stop trading, thereby diminishing liquidity; (b) Flash crash: increased
algorithm and market integration means a meltdown in a major market or asset class often has a ripple effect across other
markets; (c) Uncertainty: algorithm opaqueness stokes investor uncertainty; (d) Rogue algorithms: due to speed and
lack of transparency, one errant or faulty algorithm can rack up millions in losses in a very short period (e.g., Knight
Capital lost $440 million in a 45-minute period on August 1, 2012); and (e) Algorithm uniformity: a lack of diversity in
(trading) algorithms could reduce robustness in a market (cf. Irish potato famine).


6.5. Legality and ethics 


Increasingly, ML algorithms self-program and evolve dynamically, raising concerns about explainability of financial 
decisions (e.g., for mortgages, loans); discriminatory, unethical and illegal behavior (e.g., CV/Resume ‘sifting’ recruitment 
systems); and unintentional fraudulent systems (e.g., systematic trading systems and market manipulation). Naturally, 
financial institutions and regulators are becoming increasingly concerned with ensuring there remains a modicum of 
human control. 


6.6. Compliance and regulation 


Compliance departments are increasingly using AI algorithms to automate procedures and monitor behavior of employees. 
Now they face the increasing challenge of self-programming algorithms being discriminatory, unethical and illegal; 
exposing the institutions to (unintentional) reputational damage, financial loss and potentially large fines. 


Financial regulators are also increasingly using AI algorithms to automate monitoring and reporting (Treleaven et al.,
2019). To underpin this automation and leverage AI algorithms, leading financial regulators are seeking to encode
regulatory rules as computer-executable code, allowing compliance and regulation to be fully automated and to operate in
real time and across multiple jurisdictions.


Traditionally, regulators have faced the challenge of regulating institutions, individuals, and processing the ‘tsunami’ 
of reporting data. Going forward, regulators have the additional challenge of regulating algorithm behavior. 


6.7. Legal status of algorithms 


Finally, there is the growing discussion in the Judiciary concerning the ‘status of algorithms in Law’. In Law, as we know, 
companies have the rights and obligations of a person. Algorithms are rapidly emerging as artificial persons: a legal 
entity that is not a human being but for certain purposes is legally considered to be a natural person. The argument is 
that since algorithms are doing or intermediating business (agency) with humans, companies and even other algorithms 
they also need to have the status of an artificial person in Law. 


6.8. Alternative data 


Although AI and algorithms receive all the publicity, many people believe that we have yet to experience the
full extent of the so-called data revolution, and especially the use of alternative data by the capital markets. For example,
investment funds are buying anonymized real-time credit card data, and can therefore ‘see’ what is going through a
retailer’s tills and, more broadly, across the whole industry sector. As a definition, alternative data (in finance) refers to
data used to obtain insight into the investment process, drawn from sources such as financial transactions, retail data,
sensors, mobile devices, satellites, public records, and the Internet. Surprisingly, companies that produce alternative data
(e.g., credit card, retail, telecoms, transportation, etc.) generally overlook the value of their data to financial institutions.
Hence, these data sets are often less readily accessible and less well structured than traditional sources of data. Refer to
Denev and Amen (2020) for a comprehensive introduction to and review of this topic.


7. Conclusion 


This paper reviews AI, ML and associated algorithms, and discusses their future impact on the Insurance Markets. The 
data science technologies driving change include: Big data, AI analytics, Internet of Things, and Blockchain technologies. 
These technologies are important since they underpin the automation of the Insurance Markets and risk analysis, and 
provide the context for the algorithms, such as AI machine learning and computational statistics, which provide powerful 
analytics capabilities. 


The main disruptive forms of learning at present include Deep Learning, Adversarial Learning, Federated Learning,
Transfer Learning and Meta Learning. These forms of learning have produced new models (e.g., LSTM, GANs) and underpin
important applications (e.g., Natural Language Processing, Adversarial Examples, Deep Fakes, etc.), which we have
discussed in this review. Risk management, marketing, reserving and claims handling are only some examples of areas
where AI will empower insurance.


Companies also need to embrace automation with digital infrastructure: 


Data Management Solutions — Successful insurance companies will collect increasing amounts of historic and real-time
data (e.g., business, economic, social media, alternative) to drive business decisions.


Automated trading Platform — Digital marketplaces and platforms provide direct engagement with clients, support
automation and drive down costs.


Undoubtedly, the data revolution will be fuelled by the upgrade of technology frameworks used across the industry
(e.g., Agile development, data lakes, microservices, Python, Cloud computing, Hadoop, NoSQL databases).